Concept Drift Detection Using Online Histogram-Based Bayesian Classifiers
Authors
Abstract
In this paper, we present a novel algorithm for online histogram-based classification, designed specifically for the case in which the data is dynamic and its distribution is non-stationary. Our method, called the Online Histogram-based Naïve Bayes Classifier (OHNBC), is a statistical classifier based on well-established Bayesian theory that makes independence assumptions about the attributes. The classifier builds its prediction model from uni-dimensional histograms whose segments, or buckets, are fixed in cardinality but dynamic in width. In addition, the algorithm invokes principles of information theory to automatically identify changes in the classifier's performance and, consequently, forces the reconstruction of the classification model at runtime as and when it is needed. These properties have been confirmed experimentally on numerous data sets from different domains. As far as we know, our histogram-based Naïve Bayes classification paradigm for time-varying data sets is both novel and pioneering.
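The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the OHNBC algorithm itself, whose details the abstract does not give. All names (`HistogramNaiveBayes`, `equi_depth_edges`, `bernoulli_kl`, `run_stream`) and the parameters `window`, `threshold`, and `n_buckets` are hypothetical. The sketch assumes bucket boundaries shared across classes, a full rebuild from the most recent window instead of incremental histogram updates, and a Kullback-Leibler divergence between recent and reference error rates as a stand-in for the unspecified information-theoretic criterion.

```python
import math
from bisect import bisect_right
from collections import deque


def equi_depth_edges(values, n_buckets):
    """Interior boundaries that put roughly equal counts in every bucket."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_buckets] for i in range(1, n_buckets)]


class HistogramNaiveBayes:
    """Naive Bayes over per-attribute equi-depth histograms (illustrative only)."""

    def __init__(self, n_buckets=4):
        self.n_buckets = n_buckets

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        # Assumption: one set of bucket boundaries per attribute, shared by all classes.
        self.edges = [equi_depth_edges([x[j] for x in X], self.n_buckets)
                      for j in range(len(X[0]))]
        self.log_mass = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            for j, edges in enumerate(self.edges):
                counts = [1.0] * self.n_buckets  # Laplace smoothing
                for row in rows:
                    counts[bisect_right(edges, row[j])] += 1
                total = sum(counts)
                self.log_mass[c, j] = [math.log(k / total) for k in counts]
        return self

    def predict(self, x):
        def score(c):
            # Independence assumption: per-attribute log-probabilities are summed.
            return self.log_prior[c] + sum(
                self.log_mass[c, j][bisect_right(edges, x[j])]
                for j, edges in enumerate(self.edges))
        return max(self.classes, key=score)


def bernoulli_kl(p, q, eps=1e-3):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def run_stream(stream, window=100, threshold=0.3, n_buckets=4):
    """Test-then-train over (x, label) pairs; returns (model, rebuild_indices)."""
    recent = deque(maxlen=window)   # latest labelled examples
    errors = deque(maxlen=window)   # 0/1 outcomes of the current model
    model, ref_error, rebuilds = None, 0.0, []

    def rebuild():
        X = [x for x, _ in recent]
        y = [label for _, label in recent]
        new_model = HistogramNaiveBayes(n_buckets).fit(X, y)
        wrong = sum(new_model.predict(x) != label for x, label in recent)
        errors.clear()
        return new_model, wrong / len(recent)

    for i, (x, label) in enumerate(stream):
        if model is not None:
            errors.append(int(model.predict(x) != label))
        recent.append((x, label))
        if model is None:
            if len(recent) == window:
                model, ref_error = rebuild()  # initial model after warm-up
            continue
        if len(errors) == window:
            rate = sum(errors) / window
            # Assumed drift test: the recent error rate diverges upward from the reference.
            if rate > ref_error and bernoulli_kl(rate, ref_error) > threshold:
                model, ref_error = rebuild()
                rebuilds.append(i)
    return model, rebuilds
```

Calling `run_stream` on an iterable of `(attribute_list, label)` pairs returns the final model and the stream positions at which the model was rebuilt. A rebuild triggered shortly after a change may still be trained on a mix of old and new examples, in which case the same test fires again once the window has filled with post-change data.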
Similar resources
Concept Drift Detection Using Online Bayesian Classifier
In data classification, the goal is to predict the category of novel instances based on a collection of exemplars whose respective categories are known a priori. The state of the art includes various algorithms to solve this problem, including Naive Bayes, Random Forest, and Support Vector Machines (SVM), among others. Most of these classifiers consider that the statistical data distribution remains ...
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
A data stream is a sequence of data generated from various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by using gradual forgetting. Concept drift refer...
An Adaptive Prequential Learning Framework for Bayesian Network Classifiers
We introduce an adaptive prequential learning framework for Bayesian Network Classifiers which attempts to handle the cost-performance trade-off and cope with concept drift. Our strategy for incorporating new data is based on bias management and gradual adaptation. Starting with the simple Naïve Bayes, we scale up the complexity by gradually increasing the maximum number of allowable attribute d...
Adaptive learning algorithms for Bayesian network classifiers
This thesis is concerned with adaptive learning algorithms for Bayesian network classifiers (BNCs) in a prequential (on-line) learning scenario. Online learning is particularly relevant since in many applications learning algorithms act in environments where the data flows continuously. An efficient supervised learning algorithm in dynamic environments must be able to improve its predictive accur...
Adaptive Bayesian network classifiers
This paper is concerned with adaptive learning algorithms for Bayesian network classifiers in a prequential (on-line) learning scenario. In this scenario, new data is available over time. An efficient supervised learning algorithm must be able to improve its predictive accuracy by incorporating the incoming data, while optimizing the cost of updating. However, if the process is not str...